Limitations of Current Grammar Induction Algorithms
Abstract
I review several grammar induction algorithms (ABL, Emile, Adios) and test them on the Eindhoven corpus. The results are disappointing compared to those obtained on the commonly used corpora (ATIS, OVIS). I also show that neither POS tags induced by Biemann’s unsupervised POS-tagging algorithm nor hand-corrected POS tags improve the results when used as input. Finally, I argue for the development of entirely incremental grammar induction algorithms in place of the approaches taken by the systems discussed.
Similar resources
Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction
Work in grammar induction should help shed light on the amount of syntactic structure that is discoverable from raw word or tag sequences. But since most current grammar induction algorithms produce unlabeled dependencies, it is difficult to analyze what types of constructions these algorithms can or cannot capture, and, therefore, to identify where additional supervision may be necessary. This...
A Greedy Approach to Unsupervised Grammar Induction for Filipino
Copyright 2008. This paper discusses the Greedy Merge Model used for an unsupervised grammar induction system for the Filipino language. The approach attempts to address the current state of Philippine linguistic resources, specifically the formal grammars, which are insubstantial for robust analysis. The Greedy Merge Model results show an F1 measure of 69%. Generated grammar rules are ...
Time series anomaly discovery with grammar-based compression
The problem of anomaly detection in time series has recently received much attention. However, many existing techniques require the user to provide the length of a potential anomaly, which is often unreasonable for real-world problems. In addition, they are also often built upon computing costly distance functions – a procedure that may account for up to 99% of an algorithm’s computation time. ...
Towards High Speed Grammar Induction on Large Text Corpora
In this paper we describe an efficient and scalable implementation for grammar induction based on the EMILE approach ([2], [3], [4], [5], [6]). The current EMILE 4.1 implementation ([11]) is one of the first efficient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction research on various types of corpora...
A Bayesian Model for Supervised Induction of Natural Language Grammar
In this paper, we show that the problem of grammar induction could be modeled as a combination of several model selection problems. We use the infinite generalization of a Bayesian model of cognition to solve each model selection problem in our grammar induction model. This Bayesian model is capable of solving model selection problems, consistent with human cognition. We also show that using th...